The Shannon Lower Bound
is Asymptotically Tight 

Tobias Koch, Member, IEEE 


Abstract 

The Shannon lower bound is one of the few lower bounds on the rate-distortion function that hold for a large
class of sources. In this paper, it is demonstrated that its gap to the rate-distortion function vanishes as the allowed
distortion tends to zero for all sources that have a finite differential entropy and whose integer part has a finite
entropy. Conversely, it is demonstrated that if the integer part of the source has an infinite entropy, then its
rate-distortion function is infinite for every finite distortion. Consequently, the Shannon lower bound provides an
asymptotically tight bound on the rate-distortion function if, and only if, the integer part of the source has a finite entropy.

Index Terms 

Rate-distortion theory, Rényi information dimension, Shannon lower bound.

I. Introduction 

Suppose that we wish to quantize a memoryless, $d$-dimensional source with a distortion not larger than $D$. More
specifically, suppose a source produces the sequence of independent and identically distributed (i.i.d.), $d$-dimensional,
real-valued random vectors $\{\mathbf{X}_k,\, k \in \mathbb{Z}\}$ according to the distribution $P_{\mathbf{X}}$, and suppose that we employ a vector
quantizer that produces a sequence of reconstruction vectors $\{\hat{\mathbf{X}}_k,\, k \in \mathbb{Z}\}$ satisfying

$$\varlimsup_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E\bigl[\|\mathbf{X}_k - \hat{\mathbf{X}}_k\|^r\bigr] \le D \tag{1}$$

for some norm $\|\cdot\|$ and some $r > 0$. (We use $\varlimsup$ to denote the limit superior and $\varliminf$ to denote the limit inferior.)
Rate-distortion theory states that if for every blocklength $n$ and distortion constraint $D$ we quantize the sequence
of source vectors $\mathbf{X}_1,\ldots,\mathbf{X}_n$ to one of $e^{nR}$ possible sequences of reconstruction vectors $\hat{\mathbf{X}}_1,\ldots,\hat{\mathbf{X}}_n$, then the
smallest rate $R(D)$ (in nats per source symbol) for which there exists a vector quantizer satisfying (1) is given by
[1], [2]

$$R(D) = \inf_{P_{\hat{\mathbf{X}}|\mathbf{X}}:\; E[\|\mathbf{X}-\hat{\mathbf{X}}\|^r] \le D} I(\mathbf{X};\hat{\mathbf{X}}) \tag{2}$$

where the infimum is over all conditional distributions of $\hat{\mathbf{X}}$ given $\mathbf{X}$ for which

$$E\bigl[\|\mathbf{X}-\hat{\mathbf{X}}\|^r\bigr] \le D \tag{3}$$

and where the expectation in (3) is computed with respect to the joint distribution $P_{\mathbf{X}} P_{\hat{\mathbf{X}}|\mathbf{X}}$. Here and throughout
the paper we omit the time indices where they are immaterial. The rate $R(D)$ as a function of $D$ is referred to as
the rate-distortion function.

Unfortunately, the rate-distortion function is unknown except in a few special cases. It therefore needs to be 
assessed by means of upper and lower bounds. Arguably, for sources with a finite differential entropy, the most 
important lower bound is the Shannon lower bound [1], [2], which for a d-dimensional, real-valued source and the 
distortion constraint (3) is given by [3] 

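$$R_{\mathrm{SLB}}(D) \triangleq h(\mathbf{X}) + \frac{d}{r}\log\frac{1}{D} - \frac{d}{r}\log\left(\frac{r e}{d}\bigl(V_d\,\Gamma(1+d/r)\bigr)^{r/d}\right). \tag{4}$$

Here $\log(\cdot)$ denotes the natural logarithm, $V_d$ denotes the volume of the $d$-dimensional unit ball $\{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| \le 1\}$,
and $\Gamma(\cdot)$ denotes the Gamma function. While this lower bound is tight only for some special sources, it converges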
to the rate-distortion function as the allowed distortion D tends to zero, provided that the source satisfies some 
regularity conditions; see, e.g., [4]-[7]. A finite-blocklength refinement of the Shannon lower bound has recently 
been given by Kostina [8], [9]. 
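
As a quick numerical sanity check, the following sketch evaluates the Shannon lower bound (4), written as $h(\mathbf{X}) + \frac{d}{r}\log\frac{1}{D} - \frac{d}{r}\log\bigl(\frac{re}{d}(V_d\,\Gamma(1+d/r))^{r/d}\bigr)$, for a scalar Gaussian source under squared-error distortion, where the rate-distortion function is known to equal $\frac{1}{2}\log(\sigma^2/D)$ for $D \le \sigma^2$. The function name `shannon_lower_bound` and its argument names are illustrative choices and do not come from the paper; the unit-ball volume $V_d$ is passed in by the caller because it depends on the chosen norm.

```python
import math

def shannon_lower_bound(h_x, d, r, D, unit_ball_volume):
    """Evaluate the Shannon lower bound (4) in nats.

    h_x is the differential entropy of the source (nats); unit_ball_volume
    is V_d for the norm in use. All names here are illustrative.
    """
    log_term = math.log(r * math.e / d) + (r / d) * (
        math.log(unit_ball_volume) + math.lgamma(1.0 + d / r)
    )
    return h_x + (d / r) * math.log(1.0 / D) - (d / r) * log_term

# Scalar Gaussian with squared error: d = 1, r = 2, and the unit ball
# [-1, 1] has volume (length) 2.
sigma2, D = 1.0, 0.01
h_gauss = 0.5 * math.log(2.0 * math.pi * math.e * sigma2)
slb = shannon_lower_bound(h_gauss, d=1, r=2, D=D, unit_ball_volume=2.0)
gaussian_rd = 0.5 * math.log(sigma2 / D)
print(slb, gaussian_rd)
```

For this source the two printed values agree, which reflects the fact that the Gaussian source with squared-error distortion is one of the special cases in which the bound is tight.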

To the best of our knowledge, the most general proof of the asymptotic tightness of the Shannon lower bound 
is due to Linder and Zamir [7]. While Linder and Zamir considered more general distortion measures, specialized 
to the norm-based distortion (3), they showed the following. 

Theorem 1 (Linder and Zamir [7, Cor. 1]): Suppose that $\mathbf{X}$ has a probability density function (pdf) and that
$h(\mathbf{X})$ is finite. Assume further that there exists an $\alpha > 0$ such that $E[\|\mathbf{X}\|^{\alpha}] < \infty$. Then the Shannon lower bound
is asymptotically tight, i.e.,

$$\lim_{D \downarrow 0} \bigl\{R(D) - R_{\mathrm{SLB}}(D)\bigr\} = 0. \tag{5}$$

Proof: See [7]. ■

The theorem’s conditions are very mild and satisfied by the most common source distributions. In fact, Theorem 1
demonstrates that the Shannon lower bound provides a good approximation of the rate-distortion function for small
distortions even if there exists no quantizer with a finite number of codevectors and of finite distortion, i.e., when
$E[\|\mathbf{X}\|^r] = \infty$. However, the theorem’s conditions are more stringent than the ones sometimes encountered in
analyses of the rate and distortion redundancies of high-resolution quantizers. This is relevant because the Shannon
lower bound is often used as a benchmark against which the performance of such quantizers is measured.


February 23, 2016 


DRAFT 






For example, Gish and Pierce [10] studied the smallest output entropy that can be achieved via scalar quantization
with given expected quadratic distortion, i.e.,

$$R_s(D) \triangleq \inf_{q:\; E[(X - q(X))^2] \le D} H\bigl(q(X)\bigr) \tag{6}$$

where the infimum is over all deterministic mappings $q(\cdot)$ from the source alphabet $\mathcal{X}$ to some (countable)
reconstruction alphabet $\hat{\mathcal{X}}$ satisfying $E[(X - q(X))^2] \le D$. For one-dimensional sources that have a pdf satisfying
some continuity and decay constraints, they showed that the asymptotic excess rate is given by

$$\lim_{D \downarrow 0} \bigl\{R_s(D) - R(D)\bigr\} = \frac{1}{2}\log\frac{\pi e}{6}. \tag{7}$$


They further showed that this excess rate can be achieved by a uniform quantizer, hence the well-known result that
“uniform quantizers are asymptotically optimal as the allowed distortion tends to zero.” Since the rate-distortion
function $R(D)$ is in general unknown, they showed instead that

$$\lim_{D \downarrow 0} \bigl\{R_s(D) - R_{\mathrm{SLB}}(D)\bigr\} = \frac{1}{2}\log\frac{\pi e}{6}. \tag{8}$$

This is equivalent to (7) whenever the Shannon lower bound is asymptotically tight. A dual formulation of (7) was 
given by Zador [11] as the smallest asymptotic excess distortion with respect to the distortion-rate function as the 
rate tends to infinity. While Zador’s original derivation was flawed, a rigorous proof of the same result was given 
by Gray, Linder, and Li [12]. In their work, they consider d-dimensional source vectors X that have a pdf, whose 
differential entropy is finite, and that satisfy 


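$$H(\lfloor \mathbf{X} \rfloor) < \infty. \tag{9}$$

Here $\lfloor \mathbf{a} \rfloor$, $\mathbf{a} = (a_1,\ldots,a_d) \in \mathbb{R}^d$, denotes the $d$-dimensional vector with components $\lfloor a_1 \rfloor,\ldots,\lfloor a_d \rfloor$, and $\lfloor a \rfloor$,
$a \in \mathbb{R}$, denotes the integer part of $a$, i.e., the largest integer not larger than $a$. In words, condition (9) demands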
that quantizing the source with a cubic lattice quantizer of unit-volume cells gives rise to a discrete random vector 
of finite entropy. This ensures that the quantizer output can be further compressed using a lossless variable-length 
code of finite expected length. Koch and Vazquez-Vilar [13] recently demonstrated that these assumptions are also 
sufficient to recover Gish and Pierce’s result (7). 
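
The excess rate in (7) can be observed numerically. The sketch below is an illustration under stated assumptions, not a computation from the paper: it takes a standard Gaussian source (for which $R(D) = \frac{1}{2}\log(1/D)$ when $D \le 1$), applies a uniform scalar quantizer with cells $[i\Delta, (i+1)\Delta)$ and midpoint reconstruction, and compares the output entropy with $R(D)$ at the resulting distortion. The helper names and the truncation of the real line to $[-10, 10]$ are assumptions made for the example.

```python
import math

def std_normal_cdf(x):
    # erfc keeps the lower tail accurate
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def uniform_quantizer_stats(step, x_max=10.0):
    """Output entropy (nats) and mean squared error of a uniform quantizer
    with cells [i*step, (i+1)*step) and midpoint reconstruction, X ~ N(0, 1).
    The range is truncated to [-x_max, x_max]; the neglected mass is tiny."""
    n_cells = int(math.ceil(x_max / step))
    entropy, distortion = 0.0, 0.0
    for i in range(-n_cells, n_cells):
        a, b = i * step, (i + 1) * step
        c = 0.5 * (a + b)                      # reconstruction point
        p = std_normal_cdf(b) - std_normal_cdf(a)
        if p <= 0.0:
            continue
        # Closed-form partial moments of the standard normal on [a, b].
        m1 = std_normal_pdf(a) - std_normal_pdf(b)
        m2 = p + a * std_normal_pdf(a) - b * std_normal_pdf(b)
        entropy -= p * math.log(p)
        distortion += m2 - 2.0 * c * m1 + c * c * p
    return entropy, distortion

step = 0.1
H_q, D = uniform_quantizer_stats(step)
excess_rate = H_q - 0.5 * math.log(1.0 / D)   # R(D) = 0.5*log(1/D) for N(0, 1)
gish_pierce = 0.5 * math.log(math.pi * math.e / 6.0)
print(excess_rate, gish_pierce)
```

With a step size of 0.1 the measured excess rate should already be close to $\frac{1}{2}\log\frac{\pi e}{6} \approx 0.1765$ nats, and the distortion should be close to the high-resolution value $\Delta^2/12$.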

As we shall argue below, (9) is weaker than the assumption $E[\|\mathbf{X}\|^{\alpha}] < \infty$ required in Theorem 1 for the
asymptotic tightness of the Shannon lower bound. One may thus wonder whether there are sources for which the 
performance of high-resolution quantizers can be evaluated but the Shannon lower bound does not constitute a 
relevant performance benchmark. In this paper, we demonstrate that this is not the case. We show that for sources 
that have a pdf and whose differential entropy is finite, the Shannon lower bound (4) is asymptotically tight if 
(9) is satisfied. Conversely, we demonstrate that for sources that do not satisfy (9), the rate-distortion function is 
infinite for any finite distortion. Hence, condition (9) is necessary and sufficient for the asymptotic tightness of the 
Shannon lower bound. 

The quantity $H(\lfloor \mathbf{X} \rfloor)$ in (9) is intimately related to the Rényi information dimension [14], defined as

$$d(\mathbf{X}) \triangleq \lim_{m \to \infty} \frac{H\bigl(\lfloor m\mathbf{X} \rfloor / m\bigr)}{\log m}, \quad \text{if the limit exists} \tag{10}$$

which in turn coincides with the rate-distortion dimension introduced by Kawabata and Dembo [15]; see also
[16]. Generalizing Proposition 1 in [16] to the vector case, it can be shown that the Rényi information dimension
is finite if, and only if, (9) is satisfied and that a sufficient condition for finite Rényi information dimension is
$E[\log(1 + \|\mathbf{X}\|)] < \infty$, which in turn holds for any source vector for which $E[\|\mathbf{X}\|^{\alpha}] < \infty$ for some $\alpha > 0$. Thus,
(9) is indeed weaker than the assumption that $E[\|\mathbf{X}\|^{\alpha}] < \infty$.

It is common to assume that the differential entropy of the source is finite, since otherwise the Shannon lower
bound (4) is uninteresting. We next briefly discuss how (9) and the assumption of a finite differential entropy are
related. As demonstrated, e.g., in the proof of Theorem 3 in [17], a finite $H(\lfloor \mathbf{X} \rfloor)$ implies that $h(\mathbf{X}) < \infty$. In fact,
one can show that if (9) holds and the random vector $\mathbf{X}$ has a pdf, then $h(\mathbf{X}) \le H(\lfloor \mathbf{X} \rfloor)$ [18, Cor. 1]. Conversely,
one can find sources for which the differential entropy is finite but $H(\lfloor \mathbf{X} \rfloor)$ is infinite. For example, consider a
one-dimensional source with pdf

$$f_X(x) = \sum_{m=2}^{\infty} m\,p_m\,\mathbb{1}\{m \le x < m + 1/m\}, \quad x \in \mathbb{R} \tag{11}$$

where

$$p_m = \frac{1}{K m \log^2 m}, \quad m = 2, 3, \ldots \tag{12a}$$

$$K = \sum_{m=2}^{\infty} \frac{1}{m \log^2 m} \tag{12b}$$

and $\mathbb{1}\{\cdot\}$ denotes the indicator function. It is easy to check that for such a source

$$H(\lfloor X \rfloor) = \sum_{m=2}^{\infty} p_m \log\frac{1}{p_m} = \sum_{m=2}^{\infty} \frac{\log K + \log m + 2\log\log m}{K m \log^2 m} = \infty \tag{13}$$

and

$$h(X) = -\int_{\mathbb{R}} f_X(x) \log f_X(x)\,\mathrm{d}x = \sum_{m=2}^{\infty} \frac{\log K + 2\log\log m}{K m \log^2 m} < \infty. \tag{14}$$

(See remark after Theorem 1 in [14, pp. 197-198].) Thus, for sources satisfying $h(\mathbf{X}) > -\infty$, a finite $H(\lfloor \mathbf{X} \rfloor)$
implies a finite differential entropy but not vice versa.


II. Problem Setup and Main Result 

We consider a $d$-dimensional, real-valued source $\mathbf{X}$ with support $\mathcal{X} \subseteq \mathbb{R}^d$ whose distribution is absolutely
continuous with respect to the Lebesgue measure, and we denote its pdf by $f_{\mathbf{X}}$. We further assume that $\mathbf{x} \mapsto
f_{\mathbf{X}}(\mathbf{x}) \log f_{\mathbf{X}}(\mathbf{x})$ is integrable, ensuring that the differential entropy

$$h(\mathbf{X}) \triangleq -\int_{\mathcal{X}} f_{\mathbf{X}}(\mathbf{x}) \log f_{\mathbf{X}}(\mathbf{x})\,\mathrm{d}\mathbf{x} \tag{15}$$

is well-defined and finite. We have the following result.

Theorem 2 (Main Result): Suppose that the $d$-dimensional, real-valued source $\mathbf{X}$ has a pdf and that $h(\mathbf{X})$ is
finite. If $H(\lfloor \mathbf{X} \rfloor) < \infty$, then the Shannon lower bound is asymptotically tight, i.e.,

$$\lim_{D \downarrow 0} \bigl\{R(D) - R_{\mathrm{SLB}}(D)\bigr\} = 0. \tag{16}$$

Conversely, if $H(\lfloor \mathbf{X} \rfloor) = \infty$, then $R(D) = \infty$ for every $D > 0$.

Proof: See Section III. ■

Thus, Theorem 2 demonstrates that the Shannon lower bound is asymptotically tight if, and only if, $H(\lfloor \mathbf{X} \rfloor)$ is
finite.

In all fairness, we should mention that Linder and Zamir presented conditions for the asymptotic tightness of
the Shannon lower bound that are weaker than the ones presented in Theorem 1; see [7, Th. 1]. Specifically, they
showed that the Shannon lower bound is asymptotically tight if $\mathbf{X}$ has a pdf, if $h(\mathbf{X})$ is finite, and if there exists
a function $\delta\colon \mathbb{R}^d \to [0,\infty)$ satisfying the following:

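(i) The equations

$$a(D) \int_{\mathbb{R}^d} e^{-s(D)\delta(\mathbf{x})}\,\mathrm{d}\mathbf{x} = 1 \tag{17a}$$

$$a(D) \int_{\mathbb{R}^d} \delta(\mathbf{x})\, e^{-s(D)\delta(\mathbf{x})}\,\mathrm{d}\mathbf{x} = D \tag{17b}$$

have a unique pair of solutions $(a(D), s(D))$ for all $D > 0$. Moreover, $a(D)$ and $s(D)$ are continuous functions
of $D$.

(ii) Let $\mathbf{W}_D$ be a random vector with pdf $\mathbf{x} \mapsto a(D)\, e^{-s(D)\delta(\mathbf{x})}$. Then $\mathbf{W}_D \Rightarrow \mathbf{0}$ as $D \to 0$, where we use “$\Rightarrow$”
to denote convergence in distribution and $\mathbf{0}$ denotes the all-zero vector.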

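(iii) Let $\mathbf{Z}_D$ be a random vector that is independent of $\mathbf{X}$ and that has the pdf

$$f_{\mathbf{Z}_D}(\mathbf{z}) = \left(\frac{d}{r}\right)^{d/r-1} \frac{1}{V_d\,\Gamma(d/r)\,D^{d/r}}\; e^{-\frac{d}{rD}\|\mathbf{z}\|^r}, \quad \mathbf{z} \in \mathbb{R}^d. \tag{18}$$

Then $\delta(\cdot)$ satisfies $0 < E[\delta(\mathbf{X})] < \infty$ and $E[\delta(\mathbf{X} + \mathbf{Z}_D)]$ tends to $E[\delta(\mathbf{X})]$ as $D$ tends to zero.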

It is unclear whether there exists a function $\delta(\cdot)$ with the above properties that allows us to prove the asymptotic
tightness of the Shannon lower bound for all source vectors $\mathbf{X}$ satisfying $H(\lfloor \mathbf{X} \rfloor) < \infty$ and $|h(\mathbf{X})| < \infty$. In
fact, even if there existed such a function, proving that it satisfies the required conditions may be complicated.
Fortunately, the existence of such a function is not essential. Indeed, the proof of Theorem 2 follows closely the
proof of Theorem 1 in [7] but avoids the use of $\delta(\cdot)$.


III. Proof of Theorem 2 

The proof consists of two parts. In the first part, we show that if $H(\lfloor \mathbf{X} \rfloor) < \infty$, then the Shannon lower bound
is asymptotically tight (Section III-A). In the second part, we show that if $H(\lfloor \mathbf{X} \rfloor) = \infty$, then $R(D) = \infty$ for
every $D > 0$ (Section III-B).


A. Asymptotic Tightness 

In this section, we demonstrate the asymptotic tightness of the Shannon lower bound $R_{\mathrm{SLB}}(D)$ for sources that
satisfy $H(\lfloor \mathbf{X} \rfloor) < \infty$ and $|h(\mathbf{X})| < \infty$. The first steps in our proof are identical to the ones in the proof of
Theorem 1 in [7]. To keep this paper self-contained, we reproduce all the steps.
To prove asymptotic tightness of $R_{\mathrm{SLB}}(D)$, we derive an upper bound on $R(D)$ whose gap to $R_{\mathrm{SLB}}(D)$ vanishes
as $D$ tends to zero. In view of (2), an upper bound on $R(D)$ follows by choosing $\hat{\mathbf{X}} = \mathbf{X} + \mathbf{Z}_D$, where $\mathbf{Z}_D$ is
a $d$-dimensional, real-valued random vector that is independent of $\mathbf{X}$ and has pdf (18). It can be shown that
$\mathbf{Z}_D$ satisfies $E[\|\mathbf{Z}_D\|^r] = D$; see, e.g., [3, Sec. VI]. It follows that

$$R(D) \le I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_D) = h(\mathbf{X} + \mathbf{Z}_D) - h(\mathbf{Z}_D). \tag{19}$$

Furthermore, by evaluating $h(\mathbf{Z}_D)$ and comparing the result with (4), we have

$$R_{\mathrm{SLB}}(D) = h(\mathbf{X}) - h(\mathbf{Z}_D). \tag{20}$$
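
The properties of $\mathbf{Z}_D$ used in (19) and (20) can be checked numerically in the scalar case. The following sketch assumes $d = 1$ with the absolute value as norm (so $V_1 = 2$) and takes the pdf to be $f_{Z_D}(z) = (1/r)^{1/r-1}\, e^{-|z|^r/(rD)} / (2\,\Gamma(1/r)\,D^{1/r})$; it then verifies by midpoint-rule integration that this density integrates to one, that $E[|Z_D|^r] = D$, and that $h(Z_D) = \frac{1}{r}\log(reD) + \log(2\,\Gamma(1+1/r))$, which is what (20) requires when compared with the Shannon lower bound. The function names, the integration range, and the parameter values are illustrative assumptions.

```python
import math

def log_pdf(z, r, D):
    """Log of the scalar (d = 1, V_1 = 2) maximum-entropy pdf for E|Z|^r = D."""
    log_const = (1.0 / r - 1.0) * math.log(1.0 / r) - math.log(
        2.0 * math.gamma(1.0 / r) * D ** (1.0 / r)
    )
    return log_const - abs(z) ** r / (r * D)

def midpoint_integral(fn, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    dz = (hi - lo) / n
    return sum(fn(lo + (k + 0.5) * dz) for k in range(n)) * dz

r, D = 1.5, 0.3          # illustrative parameter choices
lim = 20.0               # the density is negligible outside [-lim, lim]

mass = midpoint_integral(lambda z: math.exp(log_pdf(z, r, D)), -lim, lim)
moment = midpoint_integral(
    lambda z: abs(z) ** r * math.exp(log_pdf(z, r, D)), -lim, lim
)
entropy = midpoint_integral(
    lambda z: -math.exp(log_pdf(z, r, D)) * log_pdf(z, r, D), -lim, lim
)
closed_form = (1.0 / r) * math.log(r * math.e * D) + math.log(
    2.0 * math.gamma(1.0 + 1.0 / r)
)
print(mass, moment, entropy, closed_form)
```

Working with the log-density avoids taking the logarithm of an underflowed value in the tails. For $r = 2$ the density reduces to a zero-mean Gaussian with variance $D$, and the closed form becomes $\frac{1}{2}\log(2\pi e D)$.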


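Combining (19) and (20) gives

$$0 \le R(D) - R_{\mathrm{SLB}}(D) \le h(\mathbf{X} + \mathbf{Z}_D) - h(\mathbf{X}). \tag{21}$$

Thus, asymptotic tightness of the Shannon lower bound follows by proving that

$$\varlimsup_{D \downarrow 0} h(\mathbf{X} + \mathbf{Z}_D) \le h(\mathbf{X}). \tag{22}$$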

To this end, we follow the steps (17)-(21) in [7] but with $\mathbf{Y}_{\Delta(D)}$ and $\mathbf{Y}_{\Delta(0)}$ there replaced by the random vectors
$\mathbf{Y}_D$ and $\mathbf{Y}_0$ having the respective pdfs


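$$f_{\mathbf{Y}_D}(\mathbf{y}) = \sum_{\mathbf{i} \in \mathbb{Z}^d} \Pr\bigl(\lfloor \mathbf{X} + \mathbf{Z}_D \rfloor = \mathbf{i}\bigr)\, \mathbb{1}\{\lfloor \mathbf{y} \rfloor = \mathbf{i}\}, \quad \mathbf{y} \in \mathbb{R}^d \tag{23a}$$

$$f_{\mathbf{Y}_0}(\mathbf{y}) = \sum_{\mathbf{i} \in \mathbb{Z}^d} \Pr\bigl(\lfloor \mathbf{X} \rfloor = \mathbf{i}\bigr)\, \mathbb{1}\{\lfloor \mathbf{y} \rfloor = \mathbf{i}\}, \quad \mathbf{y} \in \mathbb{R}^d. \tag{23b}$$

It follows that

$$D\bigl(f_{\mathbf{X}+\mathbf{Z}_D} \,\|\, f_{\mathbf{Y}_D}\bigr) = H(\lfloor \mathbf{X} + \mathbf{Z}_D \rfloor) - h(\mathbf{X} + \mathbf{Z}_D) \tag{24}$$

and

$$D\bigl(f_{\mathbf{X}} \,\|\, f_{\mathbf{Y}_0}\bigr) = H(\lfloor \mathbf{X} \rfloor) - h(\mathbf{X}) \tag{25}$$

where $D(f \| g)$ denotes the relative entropy between the pdfs $f$ and $g$ [19, Eq. (9.46)]. The random vector $\mathbf{Z}_D$ has
the same pdf as $D^{1/r}\mathbf{Z}_1$, where $\mathbf{Z}_1$ denotes $\mathbf{Z}_D$ for $D = 1$. Consequently, $\mathbf{Z}_D \to \mathbf{0}$ almost surely as $D$ tends to
zero and, hence, also in distribution. Since $\mathbf{X}$ and $\mathbf{Z}_D$ are independent, it follows that $\mathbf{X} + \mathbf{Z}_D \Rightarrow \mathbf{X}$ as $D$ tends to
zero. Furthermore, since the distribution of $\mathbf{X}$ is absolutely continuous with respect to the Lebesgue measure and
the set $\mathbb{Z}^d$ is countable, the probability $\Pr(\mathbf{X} \in \mathbb{Z}^d)$ is zero, so [20, Th. 2.8.1, p. 122]

$$\lim_{D \downarrow 0} \Pr\bigl(\lfloor \mathbf{X} + \mathbf{Z}_D \rfloor = \mathbf{i}\bigr) = \Pr\bigl(\lfloor \mathbf{X} \rfloor = \mathbf{i}\bigr), \quad \mathbf{i} \in \mathbb{Z}^d. \tag{26}$$

We thus conclude that $f_{\mathbf{Y}_D}$ converges pointwise to $f_{\mathbf{Y}_0}$, which by Scheffé’s lemma [21, Th. 16.12] implies that
$\mathbf{Y}_D \Rightarrow \mathbf{Y}_0$ as $D$ tends to zero.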
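By the lower semicontinuity of relative entropy (see, e.g., the proof of Lemma 4 in [22] and references therein),
it follows that

$$\varliminf_{D \downarrow 0} D\bigl(f_{\mathbf{X}+\mathbf{Z}_D} \,\|\, f_{\mathbf{Y}_D}\bigr) \ge D\bigl(f_{\mathbf{X}} \,\|\, f_{\mathbf{Y}_0}\bigr). \tag{27}$$

Together with (24) and (25), this yields

$$\varliminf_{D \downarrow 0} \bigl\{H(\lfloor \mathbf{X} + \mathbf{Z}_D \rfloor) - h(\mathbf{X} + \mathbf{Z}_D)\bigr\} \ge H(\lfloor \mathbf{X} \rfloor) - h(\mathbf{X}). \tag{28}$$

Since $H(\lfloor \mathbf{X} \rfloor) < \infty$ and $|h(\mathbf{X})| < \infty$, the claim (22) follows from (28) by showing that $H(\lfloor \mathbf{X} + \mathbf{Z}_D \rfloor)$ tends to
$H(\lfloor \mathbf{X} \rfloor)$ as $D$ tends to zero. To this end, we need the following lemma, which we state in its most general form
since it may be of independent interest.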

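Lemma 1: Let $\mathbf{X}$ and $\mathbf{Z}$ be independent $d$-dimensional random vectors. Assume that $E[\|\mathbf{Z}\|^r] < \infty$.

(i) If $H(\lfloor \mathbf{X} \rfloor) = \infty$, then $H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) = \infty$ for every $\epsilon > 0$.

(ii) If $H(\lfloor \mathbf{X} \rfloor) < \infty$ and $\Pr(\mathbf{X} \in \mathbb{Z}^d) = 0$, then

$$\lim_{\epsilon \downarrow 0} H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) = H(\lfloor \mathbf{X} \rfloor). \tag{29}$$

Proof: See appendix. ■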
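
Part (ii) of the lemma can be illustrated with a scalar example that is not taken from the paper: let $X$ be uniform on $[0, 2)$, so that $H(\lfloor X \rfloor) = \log 2$ and $\Pr(X \in \mathbb{Z}) = 0$, and let $Z$ be standard Gaussian. For this assumed pair the distribution function of $X + \epsilon Z$ has the closed form $\frac{\epsilon}{2}\bigl[G(a/\epsilon) - G((a-2)/\epsilon)\bigr]$ with $G(t) = t\,\Phi(t) + \varphi(t)$, so the entropy of $\lfloor X + \epsilon Z \rfloor$ can be computed without simulation. All function names below are hypothetical.

```python
import math

def _Phi(t):
    """Standard normal cdf (erfc keeps the lower tail accurate)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def _G(t):
    """Antiderivative of the standard normal cdf: t*Phi(t) + phi(t)."""
    return t * _Phi(t) + math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf_sum(a, eps):
    """Pr(X + eps*Z <= a) for X ~ Uniform[0, 2) independent of Z ~ N(0, 1)."""
    return 0.5 * eps * (_G(a / eps) - _G((a - 2.0) / eps))

def entropy_of_floor(eps, i_min=-12, i_max=14):
    """Entropy (nats) of floor(X + eps*Z), summed over cells i_min..i_max-1.
    For eps <= 1 the mass outside this range is negligible."""
    h = 0.0
    for i in range(i_min, i_max):
        p = cdf_sum(i + 1.0, eps) - cdf_sum(float(i), eps)
        if p > 0.0:
            h -= p * math.log(p)
    return h

entropies = {eps: entropy_of_floor(eps) for eps in (1.0, 0.1, 0.01, 0.001)}
for eps, h in entropies.items():
    print(eps, h, h - math.log(2.0))
```

In this example the gap to $\log 2$ shrinks as $\epsilon$ decreases, in line with (29). The gap is caused by the small probability that the perturbation pushes $X$ across one of the cell boundaries at 0 and 2.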

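The random vector $\mathbf{Z}_D$ is independent of $\mathbf{X}$ and has the same pdf as $D^{1/r}\mathbf{Z}_1$, where $\mathbf{Z}_1$ satisfies $E[\|\mathbf{Z}_1\|^r] = 1$.
Furthermore, by assumption, $H(\lfloor \mathbf{X} \rfloor) < \infty$ and $\Pr(\mathbf{X} \in \mathbb{Z}^d) = 0$ (since $\mathbf{X}$ has a pdf and $\mathbb{Z}^d$ is countable). It thus
follows from Part (ii) of Lemma 1 that

$$\lim_{D \downarrow 0} H(\lfloor \mathbf{X} + \mathbf{Z}_D \rfloor) = \lim_{D \downarrow 0} H(\lfloor \mathbf{X} + D^{1/r}\mathbf{Z}_1 \rfloor) = H(\lfloor \mathbf{X} \rfloor). \tag{30}$$

Combining (30) with (28) yields (22), which in turn demonstrates that the Shannon lower bound is asymptotically
tight if $H(\lfloor \mathbf{X} \rfloor) < \infty$ and $|h(\mathbf{X})| < \infty$. This proves the first part of Theorem 2.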

B. Infinite Rate-Distortion Function 

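To prove that $H(\lfloor \mathbf{X} \rfloor) = \infty$ implies $R(D) = \infty$ for every $D > 0$, we show that $I(\mathbf{X}; \hat{\mathbf{X}}) = \infty$ for every pair
of random vectors $(\mathbf{X}, \hat{\mathbf{X}})$ satisfying (3) and $H(\lfloor \mathbf{X} \rfloor) = \infty$. To this end, we follow along the lines of the proof
of Theorem 6 in [18, App. A]. Indeed, it follows from the data processing inequality [23, Cor. 7.16] that for an
arbitrary $T > 0$

$$I(\mathbf{X}; \hat{\mathbf{X}}) \ge I\bigl(g_T(\lfloor \mathbf{X} \rfloor); \lfloor \hat{\mathbf{X}} \rfloor\bigr) \tag{31}$$

where the function $g_T\colon \mathbb{R}^d \to [-T, T]^d$ clips its argument to the hypercube $[-T, T]^d$, i.e.,

$$g_T(\mathbf{x}) \triangleq \max\{\min\{\mathbf{x}, \mathbf{T}\}, -\mathbf{T}\}, \quad \mathbf{x} \in \mathbb{R}^d. \tag{32}$$

In (32), $\mathbf{T}$ denotes the $d$-dimensional vector $(T,\ldots,T)$, and $\max\{\cdot,\cdot\}$ and $\min\{\cdot,\cdot\}$ denote the component-wise
maximum and minimum, respectively. Since $H(g_T(\lfloor \mathbf{X} \rfloor))$ is finite, the mutual information on the right-hand side
(RHS) of (31) can be written in the form

$$I\bigl(g_T(\lfloor \mathbf{X} \rfloor); \lfloor \hat{\mathbf{X}} \rfloor\bigr) = H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) - H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \hat{\mathbf{X}} \rfloor\bigr) \tag{33}$$

which is well-defined.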

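We first show that the second entropy on the RHS of (33) is bounded in $T$ for all pairs of vectors $(\mathbf{X}, \hat{\mathbf{X}})$
satisfying (3). Using basic properties of entropy together with the fact that the entropy of a function of a random
variable is less than or equal to the entropy of the random variable itself [19, Ex. 5, p. 43], we obtain

$$H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \hat{\mathbf{X}} \rfloor\bigr) \le H\bigl(\lfloor \mathbf{X} \rfloor \bigm| \lfloor \hat{\mathbf{X}} \rfloor\bigr) \le H\bigl(\lfloor \mathbf{X} - \hat{\mathbf{X}} \rfloor\bigr) + H\bigl(\lfloor \mathbf{X} \rfloor \bigm| \lfloor \hat{\mathbf{X}} \rfloor, \lfloor \mathbf{X} - \hat{\mathbf{X}} \rfloor\bigr). \tag{34}$$

Since $E[\log(1 + \|\mathbf{X} - \hat{\mathbf{X}}\|)] < \infty$ for all $(\mathbf{X}, \hat{\mathbf{X}})$ satisfying (3), generalizing Proposition 1 in [16] to the vector
case yields that

$$H\bigl(\lfloor \mathbf{X} - \hat{\mathbf{X}} \rfloor\bigr) < \infty. \tag{35}$$

Furthermore, denoting $\mathbf{Y} \triangleq \mathbf{X} - \hat{\mathbf{X}}$, we obtain

$$H\bigl(\lfloor \mathbf{X} \rfloor \bigm| \lfloor \hat{\mathbf{X}} \rfloor, \lfloor \mathbf{X} - \hat{\mathbf{X}} \rfloor\bigr) = H\bigl(\lfloor \hat{\mathbf{X}} + \mathbf{Y} \rfloor \bigm| \lfloor \hat{\mathbf{X}} \rfloor, \lfloor \mathbf{Y} \rfloor\bigr) \le d \log 2 \tag{36}$$

since, conditioned on $\lfloor \hat{\mathbf{X}} \rfloor$ and $\lfloor \mathbf{Y} \rfloor$, each component of $\lfloor \hat{\mathbf{X}} + \mathbf{Y} \rfloor$ can only take on the values $\lfloor \hat{X}_\ell \rfloor + \lfloor Y_\ell \rfloor$ and
$\lfloor \hat{X}_\ell \rfloor + \lfloor Y_\ell \rfloor + 1$ (see also the proof of Proposition 8 in [18]). Combining (34)-(36) yields

$$\sup_{T > 0} H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \hat{\mathbf{X}} \rfloor\bigr) < \infty. \tag{37}$$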
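
The counting argument behind the bound $d \log 2$ in (36) rests on the elementary fact that, for real numbers $x$ and $y$, the integer part of $x + y$ equals either $\lfloor x \rfloor + \lfloor y \rfloor$ or $\lfloor x \rfloor + \lfloor y \rfloor + 1$. The following sketch checks this on random inputs; it uses exact rational arithmetic so that floating-point rounding in the sum cannot affect the floors. The sample size, range, and seed are arbitrary choices made for the example.

```python
import math
import random
from fractions import Fraction

random.seed(0)
carries = set()
for _ in range(20_000):
    # Fraction(float) is exact, so the floors below are free of rounding error.
    x = Fraction(random.uniform(-50.0, 50.0))
    y = Fraction(random.uniform(-50.0, 50.0))
    carries.add(math.floor(x + y) - math.floor(x) - math.floor(y))

print(sorted(carries))
```

Because each component contributes at most two possible values once the two integer parts are known, the conditional entropy of a $d$-dimensional vector is at most $\log 2^d = d \log 2$.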

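We next show that if $H(\lfloor \mathbf{X} \rfloor) = \infty$, then

$$\lim_{T \to \infty} H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) = \infty. \tag{38}$$

Since $T > 0$ is arbitrary, it then follows from (31) and (33) that

$$I(\mathbf{X}; \hat{\mathbf{X}}) \ge \varlimsup_{T \to \infty} \Bigl\{H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) - H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \hat{\mathbf{X}} \rfloor\bigr)\Bigr\} \tag{39}$$

which by (37) and (38) is infinite. Hence, $I(\mathbf{X}; \hat{\mathbf{X}}) = \infty$ for every pair of random vectors $(\mathbf{X}, \hat{\mathbf{X}})$ satisfying (3)
and $H(\lfloor \mathbf{X} \rfloor) = \infty$, which implies that the rate-distortion function $R(D)$ is infinite for every $D > 0$.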

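To prove (38), we note that

$$H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) \ge \sum_{\mathbf{i} \in \mathbb{Z}^d} \Pr\bigl(\lfloor \mathbf{X} \rfloor = \mathbf{i}\bigr) \log\frac{1}{\Pr(\lfloor \mathbf{X} \rfloor = \mathbf{i})}\, \mathbb{1}\{\mathbf{i} \in (-T, T)^d\} \tag{40}$$

since $\Pr(g_T(\lfloor \mathbf{X} \rfloor) = \mathbf{i}) \log(1/\Pr(g_T(\lfloor \mathbf{X} \rfloor) = \mathbf{i})) \ge 0$ for $\mathbf{i} \notin (-T, T)^d$ and $\Pr(g_T(\lfloor \mathbf{X} \rfloor) = \mathbf{i}) = \Pr(\lfloor \mathbf{X} \rfloor = \mathbf{i})$ for
$\mathbf{i} \in (-T, T)^d$. The claim thus follows from Fatou’s lemma [20, Th. 1.6.8, p. 50] and because $\mathbb{1}\{\mathbf{i} \in (-T, T)^d\}$
converges pointwise to 1 as $T \to \infty$:

$$\varliminf_{T \to \infty} H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) \ge H(\lfloor \mathbf{X} \rfloor) = \infty. \tag{41}$$

This proves the second part of Theorem 2.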
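
The behavior of the clipped entropy $H(g_T(\cdot))$ in (40) and (41) can be illustrated with a scalar sketch. The example below is an assumption-laden illustration rather than part of the proof: it uses a two-sided geometric distribution on the integers, truncated to $|i| \le 200$, which has finite entropy, so it can only show that clipping never increases entropy and that the clipped entropy grows monotonically towards the unclipped entropy as $T$ increases. The infinite-entropy case treated in the proof cannot be reproduced with a finite table. The function names are hypothetical.

```python
import math

def clip(i, T):
    """Scalar version of g_T in (32): clip i to the interval [-T, T]."""
    return max(min(i, T), -T)

def entropy(pmf):
    """Shannon entropy in nats of a pmf given as {value: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0.0)

def clipped_entropy(pmf, T):
    """Entropy (nats) of g_T(N) when the integer-valued N has the given pmf."""
    out = {}
    for i, p in pmf.items():
        key = clip(i, T)
        out[key] = out.get(key, 0.0) + p
    return entropy(out)

# Hypothetical test distribution: two-sided geometric, truncated to |i| <= 200.
q = 0.9
weights = {i: q ** abs(i) for i in range(-200, 201)}
total = sum(weights.values())
pmf = {i: w / total for i, w in weights.items()}

full = entropy(pmf)
thresholds = (1, 2, 5, 10, 50, 200)
by_T = [clipped_entropy(pmf, T) for T in thresholds]
for T, h in zip(thresholds, by_T):
    print(T, h, full)
```

Monotonicity holds because $g_T = g_T \circ g_{T'}$ whenever $T \le T'$, so $g_T(N)$ is a function of $g_{T'}(N)$ and therefore cannot have larger entropy.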
IV. Conclusions

The Shannon lower bound is one of the few lower bounds on the rate-distortion function that hold for a large class 
of sources. We have demonstrated that this lower bound is asymptotically tight as the allowed distortion vanishes 
for all sources having a finite differential entropy and a finite Rényi information dimension. Conversely, we have
demonstrated that if the source has an infinite Rényi information dimension, then the rate-distortion function is
infinite for any finite distortion.

Assuming a finite Rényi information dimension is tantamount to assuming that quantizing the source with a cubic
lattice quantizer of unit-volume cells gives rise to a discrete random vector of finite entropy. The latter assumption 
is natural in rate-distortion theory and often encountered. To this effect, we have demonstrated that this assumption 
is not only natural, but it is also a necessary and sufficient condition for the asymptotic tightness of the Shannon 
lower bound. 

For ease of exposition, we have only considered norm-based difference distortion measures, which are less general
than the distortion measures studied, e.g., by Linder and Zamir [7]. While our analysis could be generalized to
more general distortion measures, we have refrained from doing so, because we believe that it would obscure the 
analysis without offering much more insight. 


Appendix 

A. Proof of Lemma 1: Part (i) 

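We shall show by contradiction that if $H(\lfloor \mathbf{X} \rfloor) = \infty$, then $H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) = \infty$ for every $\epsilon > 0$. So let us assume
that $H(\lfloor \mathbf{X} \rfloor) = \infty$ but that there exists an $\epsilon > 0$ such that $H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) < \infty$. It then follows that, for an arbitrary
$T > 0$, the difference $H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) - H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor \mid g_T(\lfloor \mathbf{X} \rfloor))$ is well-defined and equal to $I(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor; g_T(\lfloor \mathbf{X} \rfloor))$.
(The function $g_T(\cdot)$ has been defined in (32).) Consequently, by the nonnegativity of entropy,

$$H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) \ge I\bigl(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor; g_T(\lfloor \mathbf{X} \rfloor)\bigr). \tag{42}$$

Furthermore, $H(g_T(\lfloor \mathbf{X} \rfloor))$ is finite, so the mutual information on the RHS of (42) can also be written as

$$I\bigl(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor; g_T(\lfloor \mathbf{X} \rfloor)\bigr) = H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) - H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor\bigr). \tag{43}$$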

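We next show that

$$\sup_{T > 0} H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor\bigr) < \infty. \tag{44}$$

To this end, we follow the steps (34)-(36) in Section III-B. Indeed, as in (34), it can be shown that

$$H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor\bigr) \le H(\lfloor \epsilon\mathbf{Z} \rfloor) + H\bigl(\lfloor \mathbf{X} \rfloor \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor, \lfloor \epsilon\mathbf{Z} \rfloor\bigr). \tag{45}$$

Generalizing Proposition 1 in [16] to the vector case then yields that the first entropy on the RHS of (45) is finite,
since the lemma’s assumption $E[\|\mathbf{Z}\|^r] < \infty$ implies that $E[\log(1 + \|\epsilon\mathbf{Z}\|)] < \infty$. Moreover, following the steps in
(36), the second entropy on the RHS of (45) can be upper-bounded by

$$H\bigl(\lfloor \mathbf{X} \rfloor \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor, \lfloor \epsilon\mathbf{Z} \rfloor\bigr) \le d \log 2. \tag{46}$$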
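The claim (44) thus follows.

Since $T > 0$ is arbitrary, (42) and (43) give

$$H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) \ge \varlimsup_{T \to \infty} \Bigl\{H\bigl(g_T(\lfloor \mathbf{X} \rfloor)\bigr) - H\bigl(g_T(\lfloor \mathbf{X} \rfloor) \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor\bigr)\Bigr\}. \tag{47}$$

However, if $H(\lfloor \mathbf{X} \rfloor) = \infty$ then, by (38) and (44), the RHS of (47) is infinite, which contradicts the assumption
that there exists an $\epsilon > 0$ such that $H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) < \infty$. This proves Part (i) of Lemma 1.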


B. Proof of Lemma 1: Part (ii) 

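Using basic properties of entropy, we obtain

$$H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) \le H(\lfloor \mathbf{X} \rfloor) + H\bigl(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor \bigm| \lfloor \mathbf{X} \rfloor\bigr) \le H(\lfloor \mathbf{X} \rfloor) + H(\mathbf{V}_\epsilon) \tag{48}$$

and

$$H(\lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor) \ge H(\lfloor \mathbf{X} \rfloor) - H\bigl(\lfloor \mathbf{X} \rfloor \bigm| \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor\bigr) \ge H(\lfloor \mathbf{X} \rfloor) - H(\mathbf{V}_\epsilon) \tag{49}$$

where we define $\mathbf{V}_\epsilon \triangleq \lfloor \mathbf{X} + \epsilon\mathbf{Z} \rfloor - \lfloor \mathbf{X} \rfloor$. Note that $\mathbf{V}_\epsilon$ can also be written as $\mathbf{V}_\epsilon = \lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor$, where $\bar{\mathbf{X}} \triangleq \mathbf{X} - \lfloor \mathbf{X} \rfloor$.

In view of (48) and (49), Part (ii) of Lemma 1 follows by showing that $H(\mathbf{V}_\epsilon)$ vanishes as $\epsilon$ tends to zero. We
begin by writing this entropy as (see, e.g., [9, Eq. (81)])

$$H(\mathbf{V}_\epsilon) = h\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\bigr) \tag{50}$$

where $\mathbf{U}$ is a $d$-dimensional random vector that is uniformly distributed over the hypercube $[0, 1)^d$ and that is
independent of $(\mathbf{X}, \mathbf{Z})$. We next show that

$$\lim_{\epsilon \downarrow 0} h\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\bigr) = h(\mathbf{U}). \tag{51}$$

The differential entropy of $\mathbf{U}$ is zero, so (50) and (51) demonstrate that $H(\mathbf{V}_\epsilon)$ vanishes as $\epsilon$ tends to zero, which
in turn proves Part (ii) of Lemma 1.

Since conditioning reduces entropy [19, Sec. 9.6], we have

$$h\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\bigr) \ge h(\mathbf{U}). \tag{52}$$

To prove (51), it thus remains to show that

$$\varlimsup_{\epsilon \downarrow 0} h\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\bigr) \le h(\mathbf{U}). \tag{53}$$

To this end, we follow along the lines of the proof of Theorem 1 in [7] (see also the proof of Lemma 6.9 in [24]).
Let the random vectors $\mathbf{Y}_\epsilon$ and $\mathbf{Y}_0$ have the respective pdfs

$$f_{\mathbf{Y}_\epsilon}(\mathbf{y}) = \left(\frac{d}{r}\right)^{d/r-1} \frac{1}{V_d\,\Gamma(d/r)\,\sigma_\epsilon^{d/r}}\; e^{-\frac{d}{r\sigma_\epsilon}\|\mathbf{y}\|^r}, \quad \mathbf{y} \in \mathbb{R}^d \tag{54a}$$

$$f_{\mathbf{Y}_0}(\mathbf{y}) = \left(\frac{d}{r}\right)^{d/r-1} \frac{1}{V_d\,\Gamma(d/r)\,E[\|\mathbf{U}\|^r]^{d/r}}\; e^{-\frac{d}{r E[\|\mathbf{U}\|^r]}\|\mathbf{y}\|^r}, \quad \mathbf{y} \in \mathbb{R}^d \tag{54b}$$

where

$$\sigma_\epsilon \triangleq E\bigl[\|\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\|^r\bigr]. \tag{55}$$

It follows that

$$D\bigl(f_{\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}} \,\|\, f_{\mathbf{Y}_\epsilon}\bigr) = \frac{d}{r} + \log\left(\left(\frac{r}{d}\right)^{d/r-1} V_d\,\Gamma(d/r)\right) + \frac{d}{r}\log\sigma_\epsilon - h\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\bigr) \tag{56}$$

and

$$D\bigl(f_{\mathbf{U}} \,\|\, f_{\mathbf{Y}_0}\bigr) = \frac{d}{r} + \log\left(\left(\frac{r}{d}\right)^{d/r-1} V_d\,\Gamma(d/r)\right) + \frac{d}{r}\log E[\|\mathbf{U}\|^r] - h(\mathbf{U}). \tag{57}$$

As we shall argue next, the pdf of $\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}$ converges pointwise to the pdf of $\mathbf{U}$ as $\epsilon$ tends to zero, so by
Scheffé’s lemma $\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U} \Rightarrow \mathbf{U}$ as $\epsilon$ tends to zero. Indeed, the pdf of $\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}$ is given by

$$f_{\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}}(\mathbf{x}) = \sum_{\mathbf{i} \in \mathbb{Z}^d} \Pr\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor = \mathbf{i}\bigr)\, \mathbb{1}\{\lfloor \mathbf{x} \rfloor = \mathbf{i}\}, \quad \mathbf{x} \in \mathbb{R}^d. \tag{58}$$

Since $E[\|\mathbf{Z}\|^r] < \infty$, we have that $\epsilon\mathbf{Z} \to \mathbf{0}$ almost surely as $\epsilon$ tends to zero, which implies that $\epsilon\mathbf{Z} \Rightarrow \mathbf{0}$ as $\epsilon$
tends to zero. Furthermore, the independence of $\bar{\mathbf{X}}$ and $\mathbf{Z}$ implies that $\bar{\mathbf{X}} + \epsilon\mathbf{Z} \Rightarrow \bar{\mathbf{X}}$ as $\epsilon$ tends to zero. Since by
assumption $\Pr(\mathbf{X} \in \mathbb{Z}^d) = 0$, it follows that the probability $\Pr(\bar{\mathbf{X}} \in \mathbb{Z}^d)$ is zero, so [20, Th. 2.8.1, p. 122]

$$\lim_{\epsilon \downarrow 0} \Pr\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor = \mathbf{i}\bigr) = \Pr\bigl(\lfloor \bar{\mathbf{X}} \rfloor = \mathbf{i}\bigr) = \mathbb{1}\{\mathbf{i} = \mathbf{0}\} \tag{59}$$

where the last step follows because, by definition, $\lfloor \bar{\mathbf{X}} \rfloor = \mathbf{0}$ almost surely. Applying (59) to (58), and noting that
$f_{\mathbf{U}}(\mathbf{u}) = \mathbb{1}\{\lfloor \mathbf{u} \rfloor = \mathbf{0}\}$, $\mathbf{u} \in \mathbb{R}^d$, the claim that $f_{\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}}$ converges pointwise to $f_{\mathbf{U}}$ as $\epsilon$ tends to zero follows.

We next show that

$$\lim_{\epsilon \downarrow 0} E\bigl[\|\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\|^r\bigr] = E\bigl[\|\mathbf{U}\|^r\bigr]. \tag{60}$$

To this end, we first note that, by the continuity of norms, and because the function $x \mapsto \lfloor x \rfloor$ is continuous for
$x \notin \mathbb{Z}$,

$$\lim_{\epsilon \downarrow 0} \|\lfloor \mathbf{x} + \epsilon\mathbf{z} \rfloor + \mathbf{u}\|^r = \|\lfloor \mathbf{x} \rfloor + \mathbf{u}\|^r = \|\mathbf{u}\|^r \tag{61}$$

for every $\mathbf{z} \in \mathbb{R}^d$, $\mathbf{u} \in [0, 1)^d$, and for $\mathbf{x} \in (0, 1)^d$. Furthermore, since on a finite-dimensional vector space any two
norms are within a constant factor of one another [25, p. 273], we have

$$\underline{c}\, \|\lfloor \mathbf{x} + \epsilon\mathbf{z} \rfloor + \mathbf{u}\|_1 \le \|\lfloor \mathbf{x} + \epsilon\mathbf{z} \rfloor + \mathbf{u}\| \le \bar{c}\, \|\lfloor \mathbf{x} + \epsilon\mathbf{z} \rfloor + \mathbf{u}\|_1 \tag{62}$$

for some constants $\bar{c} \ge \underline{c} > 0$, where $\|\mathbf{z}\|_1 \triangleq |z_1| + \ldots + |z_d|$, $\mathbf{z} = (z_1,\ldots,z_d) \in \mathbb{R}^d$, denotes the $L_1$-norm. It thus
follows that, for every $0 < \epsilon \le 1$,

$$\begin{aligned}
\|\lfloor \mathbf{x} + \epsilon\mathbf{z} \rfloor + \mathbf{u}\|^r &\le \bar{c}^{\,r}\, \|\lfloor \mathbf{x} + \epsilon\mathbf{z} \rfloor + \mathbf{u}\|_1^r \\
&\le \bar{c}^{\,r}\, \bigl\| |\mathbf{z}| + \mathbf{3} \bigr\|_1^r \\
&\le \bar{c}^{\,r}\, \bigl(\|\mathbf{z}\|_1 + \|\mathbf{3}\|_1\bigr)^r \\
&\le \left(\frac{\bar{c}}{\underline{c}}\right)^{r} \bigl(\|\mathbf{z}\| + \|\mathbf{3}\|\bigr)^r
\end{aligned} \tag{63}$$

where $|\mathbf{z}|$ denotes the vector of component-wise absolute values of $\mathbf{z}$ and $\mathbf{3}$ denotes the $d$-dimensional vector $(3,\ldots,3)$. Here the first step follows from (62); the second step follows
because $|\lfloor x \rfloor| \le |x| + 1$, $x \in \mathbb{R}$, and because every component of $\mathbf{x}$ and $\mathbf{u}$ lies in $[0, 1)$; the third step
follows from the triangle inequality; and the last step follows again from (62).

The lemma’s assumptions $E[\|\mathbf{Z}\|^r] < \infty$ and $\Pr(\mathbf{X} \in \mathbb{Z}^d) = 0$ imply that

$$E\bigl[(\|\mathbf{Z}\| + \|\mathbf{3}\|)^r\bigr] < \infty \tag{64}$$

and

$$\Pr\bigl(\bar{\mathbf{X}} \in (0, 1)^d\bigr) = 1. \tag{65}$$

Consequently, (60) follows from (61) and the dominated convergence theorem [20, Th. 1.6.9, p. 50]:

$$\lim_{\epsilon \downarrow 0} E\bigl[\|\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\|^r\bigr] = E\Bigl[\lim_{\epsilon \downarrow 0} \|\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\|^r\Bigr] = E\bigl[\|\mathbf{U}\|^r\bigr]. \tag{66}$$

Since $f_{\mathbf{Y}_\epsilon}$ is a continuous function of $\sigma_\epsilon$, (60) implies that $f_{\mathbf{Y}_\epsilon}$ converges pointwise to $f_{\mathbf{Y}_0}$ as $\epsilon$ tends to zero, so
by Scheffé’s lemma $\mathbf{Y}_\epsilon \Rightarrow \mathbf{Y}_0$ as $\epsilon$ tends to zero.

We conclude that $\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U} \Rightarrow \mathbf{U}$ and $\mathbf{Y}_\epsilon \Rightarrow \mathbf{Y}_0$ as $\epsilon$ tends to zero, so the lower semicontinuity of relative
entropy gives

$$\varliminf_{\epsilon \downarrow 0} D\bigl(f_{\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}} \,\|\, f_{\mathbf{Y}_\epsilon}\bigr) \ge D\bigl(f_{\mathbf{U}} \,\|\, f_{\mathbf{Y}_0}\bigr). \tag{67}$$

Combining (67) with (56) and (57), and using that, by (60), $\sigma_\epsilon \to E[\|\mathbf{U}\|^r]$ as $\epsilon$ tends to zero, it follows that

$$\varlimsup_{\epsilon \downarrow 0} h\bigl(\lfloor \bar{\mathbf{X}} + \epsilon\mathbf{Z} \rfloor + \mathbf{U}\bigr) \le h(\mathbf{U}). \tag{68}$$

This proves Part (ii) of Lemma 1.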